ABSTRACT

The serine recombinases are a diverse family of 
modular enzymes that promote high-fidelity DNA 
rearrangements between specific target sites. 
Replacement of their native DNA-binding domains 
with custom-designed Cys2-His2 zinc-finger
proteins results in the creation of engineered zinc-finger
recombinases (ZFRs) capable of achieving
targeted genetic modifications. The flexibility 
afforded by zinc-finger domains enables the 
design of hybrid recombinases that recognize a 
wide variety of potential target sites; however, this 
technology remains constrained by the strict recog- 
nition specificities imposed by the ZFR catalytic 
domains. In particular, the ability to fully reprogram 
serine recombinase catalytic specificity has been 
impeded by conserved base requirements within 
each recombinase target site and an incomplete 
understanding of the factors governing DNA recog- 
nition. Here we describe an approach to comple- 
ment the targeting capacity of ZFRs. Using 
directed evolution, we isolated mutants of the β
and Sin recombinases that specifically recognize 
target sites previously outside the scope of ZFRs. 
Additionally, we developed a genetic screen to de- 
termine the specific base requirements for site- 
specific recombination and showed that specificity 
profiling enables the discovery of unique genomic 
ZFR substrates. Finally, we conducted an extensive 
and family-wide mutational analysis of the serine recombinase
DNA-binding arm region and uncovered
a diverse network of residues that confer target specificity.
These results demonstrate that the ZFR repertoire
is extensible and highlight the potential of
ZFRs as a class of flexible tools for targeted genome
engineering.

INTRODUCTION 

In recent years, the ability to introduce highly efficient 
genetic modifications has become an accessible reality in 
the laboratory (1). Advances in genome engineering are 
transforming basic biological research and biotechnology 
by allowing researchers to induce custom alterations into 
virtually any cell type or organism. Site-specific endo- 
nucleases such as ZFNs (2-4), TALENs (5,6) and 
CRISPR/Cas (7,8) have emerged as powerful and 
broadly applicable tools for this process. Nevertheless, 
customizable nucleases are limited by numerous factors 
including potentially mutagenic off-target effects (9,10) 
and reliance on the host cell machinery to induce specific 
genetic modifications (11). Site-specific recombinases 
(SSRs), such as Cre and Flp, are an alternative class of 
DNA-modifying tools capable of performing site-specific 
integration, cassette exchange and chromosomal deletions 
(12). The utility of many site-specific recombination 
systems, however, has been hampered by the strict recog- 
nition specificities of SSRs for their natural DNA targets, 
a byproduct of the essential roles they have evolved to 
perform (13). As a result, application of these enzymes 
has been limited to cells or organisms that contain rare 
pre-existing pseudo-recognition sites (14,15) or target sites 
that have been pre-introduced through time-consuming 
and labor-intensive procedures. In order for SSRs to
achieve the level of convenience and practical utility
afforded by targeted nucleases, new and adaptable
methods for the design of variants with flexible recombination
specificities must be developed (16).

Figure 1. Overview of the small serine recombinases. (A) (Top) Crystal structure of the γδ resolvase dimer bound to target DNA (PDB ID: 1GDT)
(20). 'Left' and 'right' recombinase monomers are colored light and dark teal, respectively. DBD indicates native DNA-binding domain. Linker and
arm region are labeled for the 'right' recombinase monomer only. (Bottom) Core sequence recognized by the γδ resolvase catalytic domain. Base
positions are indicated. (B) Sequence alignment of six of the most comprehensively characterized serine recombinase catalytic domains. Conserved
residues are highlighted light teal. The α-helical and β-sheet secondary structural elements are denoted above the alignment as cylinders and arrows,
respectively.

Engineered zinc-finger recombinases (ZFRs) represent a 
potential solution to this limitation (17,18). ZFRs are 
composed of custom-designed Cys 2 -His 2 zinc-finger 
domains fused to catalytic domains derived from the 
resolvase/invertase family of serine recombinases (e.g. γδ
and Tn3 resolvases, Gin and Hin invertases) (19) 
(Figure 1A). ZFRs recombine hybrid target sites that 
consist of two inverted zinc-finger binding sites flanking 
a central 20-bp core sequence recognized by the recombin- 
ase catalytic domain (21,22). In nature, unique topological 
and spatial constraints are imposed onto these enzymes 
through the presence of multiple binding sites or accessory 
factor proteins that ensure the specificity of the recombin- 
ation reaction (11,19). By using various selection 
strategies, 'hyperactivated' recombinase mutants have 
been identified that allow for unrestricted recombination 
between minimal recognition sequences (23-27). Because 
zinc-finger domains can be assembled to recognize a wide 
variety of unique sequences (28-36), fusion of these 
hyperactivated catalytic domains with custom zinc-finger 
proteins allows design of hybrid recombinases with broad 
targeting capabilities (37,38). Yet ZFR targeting remains 
constrained by sequence restrictions imposed by the re- 
combinase catalytic domain, which requires the presence 
of a complementary 20-bp core sequence. To address this 
limitation, we recently reported the directed evolution 
of an extended collection of Gin recombinase catalytic 
domains capable of recognizing an estimated >10^7
unique 20-bp core sites (39). These efforts were based on 
mutagenesis of the C-terminal DNA-binding arm (40), a 
region of the recombinase that extends from the central E
helix and mediates sequence selectivity through specific
interactions with the DNA minor groove (Figure 1A). 
However, the scope of this technology remains limited 
because of the presence of conserved amino acid determin- 
ants that prohibit complete reprogramming of recombin- 
ase catalytic specificity. In particular, our laboratory (39) 
and others (41-43) have shown that catalytic domains
derived from the Gin and Hin recombinases have strict 
recognition specificity at base positions 6, 5 and 4; only 
a single A to T substitution at one of these positions is 
tolerated per half-site. Preliminary specificity profiling of 
the Tn3 resolvase has revealed similar base requirements 
at the equivalent half-site positions. 

By using catalytic domains with distinct targeting 
profiles (19), new ZFRs with extended targeting 
capabilities could be created (Figure 1B). The β and Sin
recombinases are two members of the resolvase/invertase 
family that recognize core sequences with increased GC 
content at positions 6, 5 and 4 (Figure 1A and Table 1). 
The Sin recombinase, originally isolated from the 
Staphylococcus aureus multiresistance plasmid pI9789 
(44), differs from the Tn3/γδ class of recombinases in
two major ways: first, it requires the presence of a non- 
specific DNA-binding protein (e.g. Hbsu) (45), and, 
second, it coordinates recombination between two 86-bp 
resH sites that contain two binding sites (45) rather than 
three. The β recombinase, isolated from the Streptococcus
pyogenes plasmid pSM19035 (46), also requires a host-encoded
accessory factor protein (e.g. Hbsu, HU or
eukaryotic HMG1 proteins) (47) and recognizes a 90-bp 
target sequence, six, with only two binding sites (48). 
Here, we report the directed evolution of new, activated
β and Sin recombinases with diverse recognition
capabilities that significantly expand the targeting
capacity of ZFRs. Additionally, we explore the specificity
determinants of the resolvase/invertase family of SSRs
and identify critical residues that could be altered to
enable the design of recombinases with expanded targeting
capabilities.

Table 1. The prototypical serine recombinases and their incorporation into ZFRs

Recombinase  Organism     Native function      Target site  Core sequence               Activating mutation(s)  Used as ZFR
γδ           E. coli      Resolvase            res          CGAA ATA TT AT AA ATT ATCG  D102Y, E124Q            N/A
Tn3          E. coli      Resolvase            res          CGAA ATA TT AT AA ATT ATCG  G70S, D102Y, E124Q      Ref. 17, 18, 21, 37, 40, 54
Gin          E. coli      Invertase            gix          CTGT AAA CC GA GG TTT TGGA  H106Y                   Ref. 18, 37, 38, 39, 40, 49
Hin          Phage Mu     Invertase            hix          TCCT AAA CC AT GG TTT AGGA  H107Y                   Ref. 18
β            S. pyogenes  Resolvase/invertase  six          CAAT AGA GT AT AC TTA TTTC  N95D                    Present work
Sin          S. aureus    Resolvase            resH         AATT TGG GT AC AC CCT AATC  Q87R, Q115R             Present work

Dinucleotide cores (i.e. crossover regions) are the central two-base group of each core sequence. Core sequence half-site positions 10-7, 6-4, 3-2, and the dinucleotide core are separated
by spaces.

MATERIALS AND METHODS 

Plasmid construction 

Split gene reassembly plasmids were constructed as previously
described (49). Briefly, GFPuv (Clontech, Mountain
View, CA, USA) was polymerase chain reaction (PCR)-amplified
with the primers 5'-GFP-ZFR-XbaI-Fwd
(5'-TTAATTAAGAGTCTAGAGGAGGCGTGcaatagagtatacttatttcCACGCCTCCAGATCTAGGAGGAATTTAAAATGAG-3')
and 3'-GFP-ZFR-HindIII-Rev
(5'-ACTGACCTAGAGAAGCTTGGAGGCGTGgaaataagtatactctattgCACGCCTCCCTGCAGTTATTTGTACAGTTCATC-3'),
where 'ZFR' corresponds to the specific 20-bp
core sequences noted throughout this study (sequence
recognized by β is in lowercase). PCR products were
cloned into the SpeI and HindIII sites of the split gene
reassembly vector. The genes for the β and Sin catalytic
domains were custom-synthesized (Blue Heron, Bothell,
WA, USA) and fused to the H1 zinc-finger protein (18)
by overlap PCR (Supplementary Table S1). ZFR libraries
based on these catalytic domains were constructed by
error-prone PCR as previously described (18,50). Ala
mutants for the Gin, Tn3 and β catalytic domains were
generated by mutagenic overlap PCR as described (40).
ZFR PCR products were cloned into the SacI and XbaI
sites of the split gene reassembly vector, and library sizes
were determined to be ~5 × 10^7. DNA sequencing
indicated ~3 amino acid substitutions per ZFR catalytic
domain. All oligonucleotides were obtained from IDT
(Coralville, IA, USA).
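
As an illustrative consistency check (not part of the original protocol), the following Python sketch extracts the lowercase core from each primer as printed above. It confirms that the reverse primer carries the reverse complement of the 20B core and that each core is flanked by inverted copies of the 9-bp H1 binding site (5'-GGAGGCGTG-3'). The helper names reverse_complement, embedded_core and flanks are hypothetical; only the sequences come from the text.

```python
# Illustrative check only: helper names are hypothetical, sequences are copied from the text.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

FWD_PRIMER = ("TTAATTAAGAGTCTAGAGGAGGCGTGcaatagagtatacttatttc"
              "CACGCCTCCAGATCTAGGAGGAATTTAAAATGAG")
REV_PRIMER = ("ACTGACCTAGAGAAGCTTGGAGGCGTGgaaataagtatactctattg"
              "CACGCCTCCCTGCAGTTATTTGTACAGTTCATC")
H1_SITE = "GGAGGCGTG"  # sequence recognized by the H1 zinc-finger protein


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, preserving letter case."""
    return seq.translate(COMPLEMENT)[::-1]


def embedded_core(primer: str) -> str:
    """Return the lowercase bases of a primer (the 20-bp core site)."""
    return "".join(base for base in primer if base.islower())


def flanks(primer: str, width: int = 9) -> tuple[str, str]:
    """Return the `width` bases immediately left and right of the lowercase core."""
    lower = [i for i, base in enumerate(primer) if base.islower()]
    start, end = lower[0], lower[-1] + 1
    return primer[start - width:start], primer[end:end + width]


if __name__ == "__main__":
    core = embedded_core(FWD_PRIMER)
    print("forward core:", core, f"({len(core)} bp)")
    print("reverse primer carries reverse complement:",
          reverse_complement(core) == embedded_core(REV_PRIMER))
    print("flanks in forward primer:", flanks(FWD_PRIMER))
```

Because the left flank is the H1 site and the right flank is its reverse complement, the two zinc-finger binding sites sit in inverted orientation around the core, as described for ZFR target sites in the Introduction.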

Recombination assays and selections 

Recombination assays and selections were performed by 
split gene reassembly as described (39,40,49). 

Substrate specificity profiling 

GFPuv was PCR-amplified with the primers 5'-GFP-mutantZFR-XbaI-Fwd
(5'-TTAATTAAGAGTCTAGAGGAGGCGTGnnnnnnnnnatacttatttcCACGCCTCCAGATCTAGGAGGAATTTAAAATGAG-3')
and 3'-GFP-wtZFR-HindIII-Rev
(5'-ACTGACCTAGAGAAGCTTGGAGGCGTGgaaataagtatactctattgCACGCCTCCCTGCAGTTATTTGTACAGTTCATC-3'),
where 5'-GFP-mutantZFR-XbaI-Fwd contained randomized base substitutions
at the 10-7, 6-4 or 3 and 2 base positions
within the 'left' 10-bp half-site of the 20B or 20S core
site, and 3'-GFP-wtZFR-HindIII-Rev contained either
the wild-type 20B or 20S core site (sequence recognized
by β is in lowercase). PCR products were digested with
XbaI and HindIII and ligated into split gene reassembly
vectors that contained ZFRs with the β-N95D or Sin-Q87R/Q115R
catalytic domains. Vectors were used to
transform Escherichia coli TOP10F' (Life Technologies),
and cells were incubated in super broth (SB) medium
with 30 µg/ml chloramphenicol. After 6 or 16 h, cells
were plated on solid lysogeny broth (LB) media with
30 µg/ml chloramphenicol or 30 µg/ml chloramphenicol
and 100 µg/ml carbenicillin, an ampicillin analog.
Recombination frequency was calculated as the number
of colonies on chloramphenicol/carbenicillin plates
divided by the number of colonies on chloramphenicol-only
plates. Colony numbers were determined by automated
counting using the GelDoc XR Imaging System
(Bio-Rad, Hercules, CA, USA). Individual chloramphenicol/carbenicillin-resistant
colonies were analyzed by direct
sequencing (Eton Biosciences, San Diego, CA, USA).
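
The recombination frequency defined above is a simple ratio of colony counts. A minimal Python sketch of that calculation follows. It assumes that both plates were spread from the same dilution, that replicates are summarized with the sample standard deviation (the text reports n = 3 but does not say which estimator was used), and that the function names and colony counts are hypothetical.

```python
# Minimal sketch; function names and example counts are hypothetical.
from statistics import mean, stdev


def recombination_frequency(cm_carb_colonies: int, cm_only_colonies: int) -> float:
    """Colonies on chloramphenicol/carbenicillin divided by colonies on chloramphenicol only."""
    if cm_only_colonies <= 0:
        raise ValueError("chloramphenicol-only plate must have at least one colony")
    return cm_carb_colonies / cm_only_colonies


def summarize(replicates: list[tuple[int, int]]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) over replicate plate pairs."""
    freqs = [recombination_frequency(double, single) for double, single in replicates]
    return mean(freqs), stdev(freqs)


if __name__ == "__main__":
    # Three made-up replicates: (Cm/Carb colonies, Cm-only colonies)
    avg, sd = summarize([(30, 100), (40, 100), (35, 100)])
    print(f"recombination: {100 * avg:.1f}% +/- {100 * sd:.1f}%")
```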

RESULTS 

Selection of enhanced β and Sin recombinase variants

To incorporate β and Sin into the ZFR architecture, we
used directed evolution to select for mutations that 
promoted unrestricted recombination between minimal 
20-bp core sequences derived from site I of the native 
six and resH recombination sites (hereafter referred to 
as 20B and 20S, respectively) (Table 1). Similar selection 
strategies have previously enabled the identification of 
hyperactivating mutations for several serine recombinases 
including Gin and Hin (23), Tn3 and γδ (24) and Sin
(25,26). We note that the serine recombinases promote
recombination between pseudo-symmetric 20-bp core sequences
that consist of two inverted 10-bp half-site regions
(Figure 1A). We used error-prone PCR to introduce ~3
amino acid mutations into each catalytic domain and
fused each library to an unmodified copy of the H1
zinc-finger protein (18), which recognizes the sequence
5'-GGAGGCGTG-3'. All 'wild-type' Sin mutants in this
work contain the fixed substitution I100T, which was previously
shown to enhance Sin-mediated recombination
(26) but had negligible activating effect in our system.
We selected active ZFR mutants by split gene reassembly
(49), a method that links recombinase activity with cell
survival in the presence of carbenicillin (Figure 2A).
After only two rounds of selection, we observed a
>1000-fold increase in recombination for each ZFR
library (Figure 2B). After the fourth round of selection,
we sequenced ~30 clones for each recombinase and
observed a diverse collection of mutations for both ZFR
libraries (Supplementary Tables S2 and S3). We identified
32 distinct substitutions for β and 44 unique substitutions
for Sin. Among sequenced β clones, ~66% contained the
substitution N95D; >22% contained E71G or M94V; and
>11% contained K70R, M94T or N107S (Figure 2C). For
Sin, ~82% of all clones harbored Q115R; ~26% contained
V78A; and >13% contained I113T, K84R, D85G
or K110R (Figure 2D). Among these, Rowland et al.
previously identified V78A, D85G, K110R and Q115R
(25). Notably, the majority of the selected mutations clustered
within or near the central E helix and recombinase
dimer interface (Figure 2E and F). The location of these
substitutions is similar to that of mutations previously
shown to enhance Gin-, Hin- (23) and Tn3-mediated recombination
(24), indicating that a conserved mechanism
for activation might involve stabilization of the recombinase
synaptic tetramer (52).

Figure 2. Directed evolution of enhanced β and Sin catalytic domains. (A) Schematic representation illustrating the split gene reassembly selection
strategy. ZFR variants are shown in various colors; β-lactamase gene is in orange and GFPuv gene is in white. (B) Selection of β and Sin variants
that recombine minimal core sites from the six and resH recombination sites, respectively. (C, D) Frequency and position of the mutations that
activate the (C) β and (D) Sin catalytic domains. Highly recurrent mutations are indicated. (E, F) Crystal structure of the activated Sin-Q115R
tetramer; view of dimer interface from above the N-terminus of the E helix (PDB ID: 3PKZ) (51). Highly recurrent (E) β and (F) Sin mutations
shown as sticks and mapped onto the rotated Sin dimer, residues labeled on upper monomer only. Sulfate ion shown as spheres. (G) Recombination
activity of β-N95D and Sin-Q87R/Q115R on the 20B, 20S, 20G and 20T core sequences. Recombination was determined by split gene reassembly.
Error bars indicate standard deviation (n = 3).

Selected β and Sin variants recombine DNA with
high efficiency 

To determine the extent to which the selected mutations 
promote recombination, we used split gene reassembly to 
evaluate the activity of individual ZFRs composed of 
various β and Sin catalytic domains on the 20B and 20S
core sequences, as well as on core sites derived from the
native Gin and Tn3 recombination sites (hereafter referred
to as 20G and 20T, respectively) (Table 1). We found that
each selected β and Sin mutant recombined its intended
DNA target >1000-fold more efficiently than the corresponding
wild-type enzymes (Table 2). One β and four Sin
clones demonstrated slightly relaxed specificity on 20T,
while no variants effectively recombined the 20G target
(Table 2). The β and Sin mutants that showed the strictest
recognition specificity for their intended DNA targets,
β-N95D and Sin-Q87R/Q115R, also harbored the most
prevalent mutation from each library (Figure 2G).
Intriguingly, in the case of Sin, a single auxiliary substitution
(Q87R) was selected for only in the presence of
Q115R. Cross-comparative specificity analysis between
these two clones revealed that Sin-Q87R/Q115R, but not
β-N95D, recombined the 20S and 20B target sites with
comparable efficiencies (Figure 2G), indicating that
β-N95D exhibits more stringent recognition specificity
than Sin-Q87R/Q115R.

Table 2. Recombination by selected β and Sin catalytic domains

Recombinase Mutations Core sequence

20B 20S 20G 20T

β None — — —

M94V +++
N95D ++++
M94T, R104H +
M94V, N107S ++++
V58A, N95D +
M94T, N95D +++
E71G, M94V, N95D ++
N68S, E71G, V88A, N95D +++
K33R, N49S, E71G, N95D +++
R18P, R41P, D55G, R67G, E71G, M94I, N107K, Y114N +++

Sin None — — —

I2V ++
Q115R +
Q87R, Q115R ++++
T32A, N97D, Q115R +
II IV, D12N, V78A, Q115R ++++
II IV, D12N, V78A, Q115R, L150P +
T77I, D85G, K110R, I133V, V138I +
HIT, V61A, D85G, K110R, I113T, Q115R ++
I64A, V78A, K84R, I90V, I113T, Q115R, Q137R ++++
V53A, E76G, E83G, D85G, V99A, N102S, I113S, Q115R ++++

Symbols indicate recombination efficiency. ++++, >35% recombination; +++, 20-35%; ++, 6-19%; +, 1-5%; -, <0.1%; —, <0.01%. The limit of
detection of recombination by split gene reassembly is ~10^-5%. All Sin variants are derived from the I100T background strain.

Specificity profile of the β recombinase

To develop a more detailed understanding of the factors 
underlying serine recombinase substrate recognition, we 
evaluated the specificity profiles of the β and Sin catalytic
domains. To accomplish this, we adapted our split gene 
reassembly selection method to identify the specific bases 
tolerated by each recombinase at every position within 
their respective 10-bp half-site regions (Figure 3A).
Previous studies with the Gin recombinase revealed a 
pseudo-modular recognition pattern within each 10-bp 
half-site; recognition was segmented into four discrete 
regions (e.g. non-specific base recognition at positions 
10, 9, 8 and 7; strict specificity at positions 6, 5 and 4; 
specific recognition of positions 3 and 2; and non-specific
recognition at the dinucleotide core) (39). Based
on these findings, we constructed a series of mutant
20B and 20S substrate libraries that contained fully 
randomized base combinations within three of the four 
half-site sub-domains (i.e. positions 10-7, 6-4 and 3 and 
2) (Figure 3B). To ensure efficient recombination, we 
elected not to introduce substitutions within the central 
dinucleotide core (i.e. the region in which crossover 
takes place between compatible 2-bp overhangs). 
In addition, to maximize the effectiveness of our selection
system, we introduced mutations only within a single 
10-bp half-site region (Figure 3B). This approach 
facilitated straightforward retrieval by DNA sequencing 
of all tolerated/recombined core sites. 
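
To make the randomization scheme concrete, the following Python sketch enumerates the theoretical members of each 20B substrate library. It assumes that positions 10-7, 6-4 and 3-2 of the 'left' half-site correspond to the first four, next three and next two bases of the core as grouped in Table 1. The names REGIONS and substrate_library are hypothetical, and the exclusion of thymine at position 7 noted later in the text is not modeled.

```python
# Minimal sketch under the stated assumptions; names are hypothetical.
from itertools import product
from typing import Iterator

CORE_20B = "CAATAGAGTATACTTATTTC"  # native beta core sequence from Table 1

# Half-site sub-regions mapped to 0-based slices of the 'left' half-site.
REGIONS = {"10-7": (0, 4), "6-4": (4, 7), "3-2": (7, 9)}


def substrate_library(core: str, region: str) -> Iterator[str]:
    """Yield every core variant with the chosen left half-site region fully randomized."""
    start, end = REGIONS[region]
    for bases in product("ACGT", repeat=end - start):
        yield core[:start] + "".join(bases) + core[end:]


if __name__ == "__main__":
    for name in REGIONS:
        size = sum(1 for _ in substrate_library(CORE_20B, name))
        print(f"positions {name}: {size} possible substrates")
```

Under these assumptions the three libraries contain 4^4 = 256, 4^3 = 64 and 4^2 = 16 possible substrates, so sequencing a few dozen recombined clones per library samples only the most readily recombined sequences.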

We evaluated the ability of β-N95D and Sin-Q87R/Q115R
to recombine DNA substrate libraries at two
time points: 6 h and 16 h. After 6 or 16 h of incubation
in liquid culture, we subjected cells harboring the library
members to antibiotic selection on LB agar plates,
followed by sequencing of individual transformants to (i)
ensure that recombination had occurred and (ii) identify
tolerated recombination substrates. As anticipated, we
observed that ZFRs that were allowed to recombine
DNA for only 6 h demonstrated greater recognition stringency
than those allowed to react for 16 h (Figure 3C and
D). Previous work by our laboratory indicated that 6-h
incubation is sufficient to allow for high levels of ZFR-mediated
recombination to occur (49). We sequenced 30
clones for each β-N95D substrate library and 10 clones for
each Sin-Q87R/Q115R substrate library. Both enzymes,
regardless of incubation time, yielded outputs that
converged on the sequence motif GT at positions 3 and
2 (Figure 3E; data for β-N95D shown only). We also
observed strong convergence toward G at position 4 for
both catalytic domains. Although no strong consensus
was observed after 16 h at positions 10, 9, 8, 7, 6 or 5
for β-N95D, substrates incubated for 6 h showed
sequence convergence at 7 of 9 positions (exclusion of
thymine at position 7 is an artifact of the system, as its
presence allows for the introduction of stop codons within
the 20-bp core) (Figure 3E; data for β-N95D shown only).
In particular, β-N95D recognition at position 6 was fully
degenerate, and the enzyme had a strong preference for A
or G at position 5. To a lesser degree, we also observed
variability at positions 9 (A or G) and 10 (T >>
C > A = T). Interestingly, the consensus sequence
derived for β-N95D shares only 60% sequence identity
with the native core sequence recognized by the wild-type
β recombinase (Table 3); however, both target sites
maintain a strong preference for A and G bases.

Figure 3. Specificity of the β-N95D catalytic domain. (A) Schematic representation illustrating the genetic screen used to profile recombinase
[...] white. (B) Randomization strategy used for specificity profiling. Randomized bases are boxed. Note that only 'left' half-site of the upstream ZFR
target site contained base substitutions. (C and D) Recombination by (C) β-N95D and (D) Sin-Q87R/Q115R for each 20B and 20S core site library,
respectively, at 6 and 16 h. (E) Number of selected base sequences (out of 30) at each position within the 20B half-site. Thirty clones were sequenced
from each 6-h library output. Recombination was determined by split gene reassembly. Error bars indicate standard deviation (n = 3).
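
The per-position tallies described in this section can be obtained by counting bases column by column across the sequenced clones. The following Python sketch shows one way to do this. It assumes each clone is reported as its nine randomizable bases written from position 10 down to position 2; the function name position_counts and the three example reads are hypothetical and are not data from this study.

```python
# Minimal sketch; the example reads are invented for illustration only.
from collections import Counter


def position_counts(half_sites: list[str], first_position: int = 10) -> dict[int, Counter]:
    """Tally bases at each half-site position; index 0 is `first_position`, counting down."""
    length = len(half_sites[0])
    if any(len(site) != length for site in half_sites):
        raise ValueError("all half-site reads must have the same length")
    return {
        first_position - offset: Counter(site[offset].upper() for site in half_sites)
        for offset in range(length)
    }


if __name__ == "__main__":
    reads = ["TAGCAGGGT", "TGACGAGGT", "CAACTGGGT"]  # hypothetical clones
    for position, counts in position_counts(reads).items():
        print(position, dict(counts))
```

With real data, positions where a single base dominates the tally correspond to the convergent positions discussed above, while flat tallies indicate degenerate recognition.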

Although we constructed the Sin substrate libraries in
a manner identical to those for β, we were unable to
conclusively determine its specificity profile due to
repeated selection of a single, potentially artifactual consensus
sequence (data not shown). Based on these
findings, we focused subsequent studies on the β-N95D
catalytic domain.

β-N95D-mediated recombination of core sequences from
the human genome 

To determine the accuracy and utility of the β-N95D specificity
profile, we next investigated whether ZFRs that
contained β-N95D could recombine pre-determined
20-bp core sites from the human genome. Based on our
earlier findings, we derived a degenerate 20-bp core site
and identified 17 recombination sites from several therapeutically
relevant human genes including breakpoint
cluster region (BCR), achromatopsia (cyclic-nucleotide
gated ion channels α (CNGA3) and β (CNGB3)), factor
VIII, factor IX and retinal pigment epithelium (RPE65)
genes (GRCh37.p10 primary reference assembly)
(Table 3). The dinucleotide core sequence within our degenerate
recombination site was restricted to WW base
combinations (i.e. AA, AT, TA or TT). This restriction
was based on previous findings indicating that conservative
base substitutions are tolerated by the Gin and Tn3
recombinases at the dinucleotide core (39). The selected
recombination sites displayed varying degrees of sequence
similarity to the wild-type core sequence (Table 3). We
flanked each 20-bp genomic core site with zinc-finger
binding sites for the H1 zinc-finger protein (18) and
evaluated recombination by split gene reassembly (49).
We found that β-N95D effectively recombined 6 of 17
(~35%) target sites, with recombination efficiencies from
3 to 95% (Table 3). Surprisingly, we found that only sites
containing an AT dinucleotide core were recombined,
indicating that β-N95D, unlike the Gin and Tn3 recombinases,
might exhibit strict dinucleotide core specificity.
The highest level of recombination was observed on target
β-AT 1, which shared the greatest degree of sequence
similarity with both the wild-type and consensus recombination
sites.

Table 3. β-Mediated recombination of core sequences derived from the human genome

Target site Gene Core sequence Recombination (%)
GTCA CCA GT TT AC
<0.01
Dinucleotide core composition is denoted in target site (e.g. β-AT 1 contains an AT dinucleotide core). Positions 10-7, 6-4, 3-2,
and the dinucleotide core are separated by spaces. Base mismatches between genomic and wild-type core sequences are
underlined. Recombination was measured by split gene reassembly. ND indicates not determined. Error values indicate
standard deviation (n = 3). Abbreviations for nucleotide substitutions are as follows: N = A, T, C, or G; V = A, C, or G;
B = T, C, or G; R = A or G; Y = T or C; W = A or T.
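
A search for candidate genomic core sites with a degenerate pattern of this kind can be sketched in Python as follows. The degenerate codes are those defined in the footnote to Table 3. The function names, the short pattern RGTWW and the toy sequence are hypothetical, and the sketch scans only the given strand; it is not the search procedure used in this study.

```python
# Minimal sketch; pattern and sequence below are invented for illustration.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "V": "ACG", "B": "CGT",
    "R": "AG", "Y": "CT", "W": "AT",
}


def matches(pattern: str, site: str) -> bool:
    """True if every base of `site` is allowed by the degenerate `pattern`."""
    return len(pattern) == len(site) and all(
        base in IUPAC[code] for code, base in zip(pattern.upper(), site.upper())
    )


def find_sites(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) for every window matching `pattern`."""
    width = len(pattern)
    return [
        (start, sequence[start:start + width])
        for start in range(len(sequence) - width + 1)
        if matches(pattern, sequence[start:start + width])
    ]


if __name__ == "__main__":
    print(find_sites("RGTWW", "CCAGTATGGTTACC"))
    # The native 20B core satisfies a pattern whose dinucleotide core is WW.
    print(matches("CAATAGAGTWWACTTATTTC", "CAATAGAGTATACTTATTTC"))
```

A practical search would also scan the reverse complement strand and would then rank hits by similarity to the wild-type core, since only a subset of matching sites was recombined here.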

Mutational analysis of the serine recombinase arm region 

While substrate specificity profiling provides insight into 
the base requirements for site-specific recombination, the 
relative importance of the specific amino acid residues 
that mediate these interactions remains largely unknown. 
Crystal structures of the γδ resolvase dimer in complex
with its target DNA have revealed that extensive
protein-DNA contacts between the C-terminal arm
region and the DNA minor groove dictate recombinase
recognition (20) (Figure 1A); however, a more detailed
understanding of these interactions is required to facilitate
reprogramming of recombinase specificity toward diverse
target sites. To better understand the factors that confer
target specificity, we used alanine-scanning mutagenesis
(53) to investigate the role of each arm region residue in
recombination. We took a family-wide approach, targeting
the DNA-binding arms of three functionally distinct
and hyperactivated recombinases: Tn3-G70S/D102Y/E124Q
(Arg120 through Arg143), Gin-H106Y (Glu117
through Pro141) and β-N95D (Ile125 through His147).
We introduced Ala substitutions into every arm position
for each catalytic domain and fused each mutant to the H1
zinc-finger protein. Native Ala residues were substituted
with Gly. We evaluated the ability of each variant to recombine
its intended 20-bp core sequence by split gene
reassembly.
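
The substitution rule used for the scan (Ala at every position, Gly where the native residue is already Ala) can be written out as a short Python sketch. The function name alanine_scan and the three-residue example peptide are hypothetical; the actual arm sequences are not reproduced in this text.

```python
# Minimal sketch; the example peptide is invented and is not a recombinase arm sequence.
def alanine_scan(arm: str, first_residue: int) -> list[tuple[str, str]]:
    """Return (mutation name, mutant sequence) for a single-residue Ala scan.

    Native Ala residues are replaced with Gly, as described in the text.
    """
    variants = []
    for offset, native in enumerate(arm):
        replacement = "G" if native == "A" else "A"
        mutant = arm[:offset] + replacement + arm[offset + 1:]
        variants.append((f"{native}{first_residue + offset}{replacement}", mutant))
    return variants


if __name__ == "__main__":
    for name, sequence in alanine_scan("RAK", first_residue=120):
        print(name, sequence)
```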

For each recombinase, we identified a network of 7-10
residues indispensable for catalysis (i.e. a >100-fold reduction
in recombination was observed on Ala or Gly substitution)
(Figure 4). This network consisted of both
evolutionarily conserved residues, presumably important
for non-specific DNA-binding, and variable residues that
likely contribute to specific target recognition. Conserved
residues, numbered according to Tn3, are Ile122, Arg125,
Thr126, Gly129, Lys134, Gly137 and Gly141. Among non-conserved
residues essential for recombination, we observed
substantial positional variation. These residues are Arg130,
A&133, Ile138 and Phe140 for Tn3 (Figure 4A); Glu117,
Ile119 and Leu127 for Gin (Figure 4B); and Ile125, Lys137
and Phe142 for β (Figure 4C). Mapping of the residues
critical for Tn3-mediated recombination onto the crystal
structure of the closely related γδ resolvase dimer revealed
that each essential position is either directly in contact with
or located in proximity to target DNA (Figure 4D). In particular,
this analysis suggests that evolutionarily conserved
residues associate with the DNA phosphate backbone,
whereas those residues implicated in specific DNA recognition
pack into the interior of the DNA minor groove. To
directly test whether variable arm residues coordinate recombinase
specificity, we attempted to switch the catalytic
specificity of the Gin recombinase to that of Tn3. For this,
we introduced the four variable residues speculated to be
essential for specific recognition by Tn3 (Arg130, Ile138,
Lys139 and Phe140) into the analogous locations within
the Gin catalytic domain (L127R, R135I, I136K and
G137F). We evaluated recombination by this chimera on
both the 20T and 20G ZFR target sites. Remarkably, this
Gin mutant displayed switched specificity, demonstrating a
>1000-fold preference for the Tn3 target over the Gin target
(Figure 4E). Unlike an earlier Gin mutant that was evolved
to recognize 20T and contained the substitutions M124S,
L127R, R131I, G137M and P141R (40), this chimera was
generated entirely by rational methods, guided by experimental
data, and used distinct residues not previously
suspected to contribute to catalytic specificity.

Figure 4. Alanine-scanning mutagenesis of the serine recombinase arm region. (A-C) Recombination activity of mutant (A) Tn3, (B) Gin and (C) β
catalytic domains on their native and minimal DNA targets. Asterisk indicates <0.0001% recombination. Dotted lines indicate threshold below
which mutants were considered non-functional. (D) Crystal structure of the γδ resolvase arm region (sticks) in contact with substrate DNA (gray
surface). Conserved and variable residues important for recombination are shown in red and purple, respectively. Inert residues are shown in yellow
(PDB ID: 1GDT) (20). (E) Recombination by a Gin chimera substituted with residues predicted to impart specificity onto the 20T core site.
Recombination was determined by split gene reassembly. Error bars indicate standard deviation (n = 3).



DISCUSSION 

Here we describe the directed evolution of novel β and Sin
variants that freely recombine minimal 20-bp core sites
derived from the six and resH site I recombination target
sequences. Two selected variants, β-N95D and Sin-Q87R/Q115R,
recombined their intended DNA targets with high
efficiency and specificity. These results support the use of
selection by split gene reassembly for the rapid identification
of hyperactivating mutations for the serine recombinases.
Cross-comparative analysis revealed that β-N95D
would likely be superior to Sin-Q87R/Q115R for targeting
applications based on its ability to discriminate the closely
related 20S target site. This finding, as well as our inability
to profile the specificity of Sin-Q87R/Q115R, suggests that
Sin may not be an ideal enzyme for incorporation into
ZFRs intended for highly specific genome engineering.
However, it may be a promising candidate for applications
that require 'generalist' catalytic activity (18,49,54) such as
non-specific gene transfer.

Selection-based screening revealed the complete specificity
profile of β-N95D and led to the derivation of a
consensus target site, which enabled the identification of
pseudo-recognition sites present in the human genome.
These findings indicate that β-N95D can recombine therapeutically
relevant DNA targets and that substrate specificity
profiling is an effective tool for identifying ZFR
recombination sites within the human genome. Toward 
a more complete understanding of the mechanisms gov- 
erning serine recombinase recognition specificity, we also 
performed an extensive family-wide mutational analysis of 
the serine recombinase DNA-binding arm region, which 
indicated that recombinase catalytic specificity is the 
product of a network of evolutionarily conserved and 
variable arm positions. 
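
As an illustration of how a degenerate consensus site can be used to search a sequence for candidate pseudo-recognition sites, the following is a minimal Python sketch. It is not the procedure used in this study: the consensus string `EXAMPLE_CONSENSUS`, the helper names and the test sequence are hypothetical, only one strand is scanned, and the flanking zinc-finger binding sites are ignored. The sketch translates IUPAC degenerate codes into a regular expression and reports every (possibly overlapping) match.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile a degenerate consensus into a regex that finds overlapping hits."""
    body = "".join(IUPAC[code] for code in consensus.upper())
    # A lookahead with a capture group lets finditer report overlapping matches.
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, consensus):
    """Return (0-based start, matched sequence) for each hit on the given strand."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical, shortened consensus with a fixed central AT dinucleotide;
# this is NOT the consensus derived in this work.
EXAMPLE_CONSENSUS = "NNWNATNN"

hits = find_sites("GGCTATCCAAGATTT", EXAMPLE_CONSENSUS)
print(hits)
```

A genome-scale search would additionally need to scan the reverse complement and to require compatible zinc-finger binding sites on either side of each core hit.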

To our knowledge, β-N95D is the first β recombinase
variant capable of catalyzing unrestricted recombination 
between minimal crossover sites derived from the native 
six target site. Rowland et al. previously identified 36 sub- 
stitutions that bypass or disrupt the functions of the Sin 
regulatory tetramer, enabling Sin-mediated recombination 
on various resH-derived crossover sites (25). In particular,
four mutations— T77I, V78A, K110R and Q115R— 
promoted recombination to near completion. Further, 
Q115R facilitated recombination in the presence of the 
inhibitory mutation R54E, suggesting that Q115R might 
be the most strongly activating mutation of the group. The 
increased range of sensitivity afforded by split gene re- 
assembly allowed us to more accurately rank the varying 
levels of activation achieved by each substitution, revealing
the following hierarchy: Q115R > V78A > K110R
>> T77I. Although highly active on its intended DNA 
target, we also found that Sin-Q115R displayed low 
levels of non-specific recombination on several non- 
cognate core sites, including those derived from the Gin 
and β target sequences. Profiling of our activated Sin recombinase
population revealed that the Sin-Q87R/Q115R
double mutant displayed similarly high levels of activity 
with reduced levels of non-specific recombination. Q87R 
had not been identified in previous screens, indicating that 
its selection might be contextually dependent on the 
presence of Q115R.

The majority of the activating mutations selected in 
this study lie within the E helix, and more specifically 
within the recombinase dimer interface. Similarly, 
the hyperactivating mutations for the Hin (H107Y), Gin 
(H106Y), Tn3 (G70S, D102Y and E124Q) and γδ
(D102Y) recombinases are also located in this region, 
indicating a conserved mode of action for these enzymes 
(52). Keenholtz et al. recently solved the crystal structure 
of the Sin-Q115R catalytic domain in the absence of sub- 
strate DNA (51). These studies revealed that the activated 
Sin tetramer is stabilized by two Arg115 residues present
on adjacent recombinase monomers, suggesting that 
activating mutations facilitate recombination either by 
destabilizing the dimeric configuration or by stabilizing 
the tetrameric conformation. Favorable stacking inter- 
actions between arginine residues have been proposed as 
a possible mechanism for tetrameric stabilization (51), 
although this does not explain the role of the Lys or Cys 
residues also observed at this position among 
hyperactivated Sin variants (25). Detailed examination 
of the crystal structure also revealed the presence of a 
negatively charged sulfate ion bound by the Arg115
residues from one subunit of each rotating dimer (51). 
This sulfate ion is in proximity to nearly all of the 
selected β and Sin activating mutations. When docked
with the substrate DNA from the DNA-bound γδ
resolvase dimer structure (20), the sulfate ion mapped 
nearly perfectly onto the scissile phosphate of the sub- 
strate DNA (51), indicating that activating mutations, 
such as Sin Q115R, may function by stabilizing the 
active tetramer geometry, leading to a persistent 'ready' 
conformation that facilitates elevated levels of catalysis. 
Recent work has indicated that activating Sin mutations 
promote DNA cleavage rather than stabilize the cleaved 
DNA product and that Sin catalytic activity is tightly 
coupled with its oligomerization state (55). The absence 
of structural information for β makes it difficult to assess
the exact role that N95D has on β-mediated recombination.
However, mapping of this mutation onto the Sin-Q115R
crystal structure reveals that N95D is in proximity
to the sulfate ion, indicating that this substitution might 
promote recombination through a mechanism similar to 
Sin-Q115R. 

Based on the degenerate β-N95D recombination site
used to identify pseudo-target sites, we estimate that β
could recombine ~6.7 × 10^7 unique 20-bp core sequences
in the context of our ZFR platform. This complements 
our previous work with the Gin recombinase catalytic 
domain (39), which was estimated to recombine 
~3.77 × 10^7 distinct core sites. Combined with our
archive of >45 pre-selected zinc-finger domains 
(28,30,31), we estimate that ZFRs based on β-N95D
could be generated to recognize nearly 20000 unique 
44-bp DNA sequences with even greater targeting 
capacity anticipated through the future development of 
β-N95D TAL effector recombinases (56). Notably, our
studies revealed that only pseudo-recognition sites that 
contained an AT dinucleotide core could be effectively 
recombined by β-N95D. While this could indicate a potential
limitation with regard to controlling the directionality
of integration, additional studies are required to
investigate this unique feature and develop a complete
understanding of substrate recognition by β-N95D. In
particular, more comprehensive methods for specificity 
profiling that consider the context-dependency of base 
substitutions will improve our knowledge of ZFR target 
recognition and should lead to the design of ZFRs with 
enhanced targeting capabilities. We previously showed 
that Gin recombinase catalytic specificity could be re-en- 
gineered toward a broad collection of unnatural core sites 
(39). Similar studies are required to determine whether 
β-N95D catalytic specificity is also re-programmable.
Currently, the ability of β-N95D to recombine core sites
beyond the scope of existing ZFRs suggests the possibility 
of targeting a subset of previously inaccessible genomic 
target sites. Several previous studies have demonstrated 
that β is capable of catalyzing site-specific recombination
in mammalian cells with pre-introduced copies of the 
native six target sites (57-60). Further studies are 
required to determine whether ZFRs based on β-N95D
are capable of catalyzing targeted integration into en- 
dogenous genomic loci. 
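
To make the arithmetic behind an estimate of this kind concrete, the following minimal Python sketch counts the distinct sequences that satisfy a degenerate site by multiplying the number of bases allowed at each position. This is an assumption about how such a count can be made, not a description of the calculation performed here: it treats positions as independent (ignoring the context-dependency noted above), and the example patterns are hypothetical rather than the β-N95D consensus.

```python
from math import prod

# Number of bases permitted by each IUPAC nucleotide code.
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def count_sequences(consensus):
    """Count distinct sequences matching a degenerate site, assuming
    that every position varies independently of the others."""
    return prod(DEGENERACY[code] for code in consensus.upper())


# Hypothetical 8-bp pattern with a fixed central AT dinucleotide.
print(count_sequences("NNWNATNN"))  # 4*4*2*4*1*1*4*4 = 2048

# Upper bound for a fully unconstrained 20-bp core.
print(count_sequences("N" * 20))  # 4**20 = 1099511627776
```

Under this independence assumption, each position that is restricted from four bases to two halves the count, which is why a small number of conserved positions reduces the targetable space by orders of magnitude.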

Finally, alanine-scanning mutagenesis of the entire arm
regions of the Gin, Tn3 and β catalytic domains revealed
the identities of residues essential for recombination. Past 
mutational studies have focused on identifying residues 
that directly participate in catalysis (61,62). Our analysis 
instead was aimed at locating key residues that confer 
target specificity. We suspect that highly conserved 
residues, such as Ile122, Arg125, Thr126, Gly129, Lys134,
Gly137 and Gly141 (numbered according to Tn3),
likely contribute critical, but largely non-specific, inter- 
actions with DNA, whereas variable positions mediate re- 
combinase specificity. The information gathered from this 
study allowed us to accurately predict a set of four Gin 
arm residues that, on substitution with the corresponding 
Tn3 residues, resulted in a chimeric Gin recombinase with 
specificity switched from that of Gin to Tn3. Together, 
these data indicate that a network of variable arm 
residues coordinates resolvase/invertase target recognition 
and that mutagenesis of these essential residues is an ef- 
fective approach for altering recombinase catalytic speci- 
ficity. These data suggest that future directed evolution 
studies should be focused on these positions, as they 
may be responsible for the emergence of the novel recog- 
nition features among these enzymes. Intriguingly, our 
data also suggest that recombinase specificity might be 
more evolutionarily flexible at core positions 3 and 2, as 
structural analysis indicates that the majority of the 
variable positions implicated in mediating specificity are 
positioned near these bases. In contrast, those residues 
that contact positions 6, 5 and 4 were observed to be 
highly conserved Gly, Arg or Tyr residues, indicating 
that these positions may not be as amenable to redesign. 
The ability of β to recognize GC content at these positions,
compared with the AT-restricted recognition of
Gin and Tn3, expands the ZFR toolbox and presents 
the opportunity for extended evolution and development 
of the platform. Future studies aimed at investigating co- 
operative effects among arm residues and their relation- 
ship to those residues that participate directly in catalysis 
should guide the design of smarter libraries and facilitate 
the generation of new recombinases with extended speci- 
ficity. In particular, by combining our knowledge of the 
degeneracy of substrate sequence tolerance with our 
ability to intelligently target the appropriate specificity 
determinants for directed evolution, we should be able 
to generate highly efficient and specific ZFRs for virtually 
any genomic target, including safe-harbor sites and 
specific disease loci. 